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ABSTRACT 


The Air Force Space Command (AFSPACECOM) uses an analytic satellite motion 
model based on the Brouwer theory, referred to as SGP4, for near-earth objects, and 
another satellite motion model based on the work of Hujsak, referred to as SDP4, for 
deep-space objects. These models assist in tracking over 7000 objects around the Earth 
each day. Also, the original satellite motion model used by the AFSPACECOM based 
on the work of Kozai and Brouwer, known as the Simplified General Perturbations 
model or SGP, is an analytic model still being used at several sensor sites. Based on the 
increasing number of space objects that require tracking and the desire for increased 
accuracy, there 1s a growing need to reduce computation time in implementing satellite 
motion models. Parallel computing offers one method to achieve this objective. This 
thesis investigates the parallel computing potential of SGP, SGP4, and SDP4 using the 
Intel 1PSC/2 hypercube multicomputer. This thesis chooses a parallel algorithm and 
applies it to SGP, SGP4, and SDP4 for implementation on the hypercube, and reports on 
the potential reduction in computer time by first considering only the calculational 
portion of the algorithm and then including the effects of the reading and writing 
portions. A diskette containing the Fortran software developed is available upon request 
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L INTRODUCTION 


The first satellite motion model used by the Air Force Space Command 
(AFSPACECOM) is known as the Simplified General Perturbations (SGP) satellite 
motion model. SGP is an analytic model based on the work of Kozai (1959) and 
Brouwer (1959), made operational by Hilton and Kuhlman (1966). It is implemented by 
a Fortran subroutine, SGP, and is used to track near-Earth satellites (period less than 225 
minutes) having orbits with low eccentricities. By 1976 a second satellite motion model 
for near-Earth satellites was implemented by a Fortran subroutine, SGP4, replacing SGP 
as the operational satellite motion model used by the AFSPACECOM. SGP4 is an 
analytic satellite motion model based on the theory developed by Lane and Cranford 
(1969) which uses the solution of Brouwer (1959). To meet the needs for deep-space 
satellites another model, SDP4, was made operational. SDP4 is an extension of SGP4 
where the deep-space theory is derived from Hujsak (1979). Currently, SGP4 refers to 
one Fortran subroutine that has combined the former separate subroutines SGP4 and 
SDP4, and is what the AFSPACECOM now uses to generate the North American 
Aerospace Defense and Command (NORAD) element sets. 

The AFSPACECOM currently tracks approximately 7000 space objects on a daily 
basis. As space technology progresses and as the dependence upon applications in space 
increases, the number of space objects that require tracking will continue to grow. The 
future use of Operational satellite motion models used by the AFSPACECOM need to 
involve shorter computational time to obtain near real time orbit predictions. Also, if the 


AFSPACECOM chooses to develop models with greater accuracy, these models will 


undoubtedly require even more computational time. Such accurate models can be used 
to predict orbits of smaller objects, increasing the number of objects by ten fold. 

Computational time in implementing a satellite motion model is highly dependent 
upon the computer configuration used. Parallel computing has the potential to 
significantly decrease computational time over serial computing currently used by the 
AFSPACECOM. Use of parallel computers has already proven to be beneficial in 
reducing computation time in many other applications. Once a decision is made to 
convert to parallel computing, future satellite motion models can be developed in a 
parallel architecture format. Ideally, these models made operational with the use of 
parallel computers will result in computationally efficient programs that yield highly 
accurate results. 

This thesis investigates the parallel computing potential of the AFSPACECOM's 
first operational satellite motion model, SGP, and the current satellite motion models, 
SGP4 and SDP4, using the Intel iPSC/2 hypercube multicomputer. The SGP model was 
included since it is still used at several surveillance sensor sites. The following chapter 
provides a description of the SGP model and outlines the algorithm used by the Fortran 
subroutine, SGP. Chapter III gives a general description of the SGP4 and SDP4 models 
and how they compare with the SGP model. Chapter IV presents an overview of parallel 
processing and discusses the methods to parallelize a serial algorithm to be applied to the 
hypercube. In Chapter V, the parallelization of SGP, SGP4, and SDP4 are presented 
with their respective success in reducing computation time. Here the effects of reading 
and writing to and from the disk are not included. A comparison between the 
parallelized versions of SGP, SGP4, SDP4 and PPT2 is also included. PPT2 is an 
analytic satellite motion model used by the Naval Space Surveillance Center 
(NAVSPASUR), and a parallelized version was created by CPT Warren E. Phipps 
(1992) at the Naval Postgraduate School using the same multicomputer, Intel iPSC/2 


hypercube. Chapter VI further investigates the parallel computing potential of SGP4 and 
SDP4 by considering the effects of increased computation time and the effect of reading 
and writing to and from the disk. The last chapter of this thesis provides conclusions and 


suggestions for future research. 


ll. SIMPLIFIED GENERAL PERTURBATIONS (SGP) MODEL 


A. OVERVIEW 

The Simplified General Perturbations model is the original model used by the 
AFSPACECOM to track artificial satellites orbiting the earth, implemented by the 
subroutine SGP. Given pseudo elements generated by the AFSPACECOM, SGP users 
Can predict the state vector (position and velocity) at a future time. This model considers 
perturbing accelerations due to the oblateness of the Earth, asymmetry of the Earth's 
mass about the equatorial plane, and atmospheric drag. This model does not include 
perturbation effects due to longitudinal variation in the Earth's gravitational potential, or 
due to other celestial bodies such as the moon or the sun. 

Satellite motion models can be classified according to how the equations of motion 
are solved. The two general classifications are known as general perturbations and 
special perturbations. General perturbation models involve solving the equations of 
motion analytically. As more perturbations are included in a satellite motion model to 
increase accuracy, analytical solutions become increasingly complicated if not impossible 
to solve. On the other hand, it is always possible to use special perturbations. Special 
perturbation models involve solving the equations of motion through numerical 
integration. Although special perturbation models are generally more accurate and less 
complicated than general perturbation models, they are computationally expensive. A 
third model type known as a semi-analytic model combines the general and special 
perturbation techniques in an attempt to balance the advantages and disadvantages of the 
two techniques, resulting in a highly accurate and computationally efficient model. The 


SGP model is an example of a general perturbations model. 


Satellite motion models can also be classified according to how the variation in the 
satellite's motion is represented. The two general classifications are known as variation 
of elements and variation of coordinates. Variation of elements identifies variations in 
terms of changes in the osculating elements with respect to time. Variation of 
coordinates involves choosing a coordinate system and describing the changes in position 
and velocity with respect to time in that coordinate system. The SGP model uses 
variation of elements and describes the changes in the classical orbital elements with 


respect to time. 


B. THEORY 

The Simplified General Perturbations (SGP) model is an analytic model developed 
by Hilton and Kuhlman (1966). SGP's gravitational submodel is a simplification of the 
work done by Kozai (1959) and Brouwer (1959). The atmospheric drag submodel 
describes the mean motion as a linear function of time. 

Before proceeding with the theoretical discussion of these models, a bnef 
explanation of how SGP defines satellite motion is needed. The variation in the motion 
due to the perturbation effects is described by the changes in the classical orbital 


elements with respect to time. The six classical elements are n, e,i,02,@, and M (see 


Figure 2.1), where 


n = mean motion 

é = eccentricity 

i = inclination of the orbital plane to the equator 
Q2 = right ascension of the ascending node 

@ = argument of perigee 

M = mean anomaly at epoch. 


The position and velocity vectors can be obtained from these six orbital elements. 





Figure 2.1 Classical Orbital Elements 


'E represents the eccentric anomaly and is related to the mean anomaly by, 


E-esinE=M. 


1. Kozai's Gravitational Model 
Here a simplification of Kozai's model is presented (Roy, 1988, pp. 312-320). 
Kozai's model is more involved than the classical Keplerian model for idealized satellite 
motion. In the classical model the Earth's potential at an external point, a distance r 


from its center, is expressed by 


ij (2.1) 
r 


where . is the gravitational parameter, equal to the product of the Earth's mass and the 
gravitational constant G. Thus, a satellite of mass m at a distance r_ possesses a potential 


energy equal to 


—-mU = Fe 
a 
Equation 2.1 represents the potential for a perfectly spherical Earth. In Kozai's 
model, the Earth is considered to be a planet that is symmetric about the spin axis and 


asymmetric about the equatorial plane . Thus, the Earth's potential may be wnitten as 


u=Hh-$4,(2) Pind) (2.2) 
r na? r 


where J, are constants, R is the Earth's equatorial radius, 6 is the declination, and 
P. (sind) is the Legendre Polynomial of order n in sind. 
The Earth's potential can be written in terms of a disturbing function F 
U=U,+F 
where U, represents the classical potential. Thus the disturbing function can be written 


as 


F =U-U, -v-H--by (2) P, (sin8). (2.3) 
ii r r 


n=2 


Now, J, is of the order 107, and JagU 4; cate OF OLdEL 10~ or less. Since 


made R)’ | 
J,,J5,... are relatively insignificant and the (= factor decreases as mn increases, 
r 


further analysis will only include J, and J,, the second and third harmonics, in 


agreement with the SGP model. Thus, the disturbing function can be written as 


2 3 
F= “1(8) sin 5-3) +4 (= Ssin 5-3). (2.4) 
r r 2 2 r Z 2 


Substituting the known relations, 


7 a(1—e?} 
~ 1l4ecosv 


and 
sind = Sinisin(v+@), 
where v is the true anomaly and 6 is the declination , as illustrated in Figure 2.1, F 


becomes 


r cote 


J,R° (ay 
= x (=| | (Bsns-2)sinco+0)~Ssin'isinaco +0) Join 


JR’ (a) 
rust (2) 3 — Sein? +5 sin? icos2¢0+ 0) 


(2), 
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Next, the disturbing function is separated into first-order secular, second-order 
secular, long-period, and short-period terms. These parts are referred to as F,, F,, F,,and 


F, respectively. Now, F is periodic with respect to the true anomaly v and the radial 


distance r, The true anomaly »v and the mean anomaly M are related by 


B22 2), 2.6) 


Since the quantities ” and v are functions of e and M only, F is periodic with respect 
a 


to M. Terms in F depending on M and not on @ are short-period. Terms in 
F depending on @ but not on M are long-period. The resulting terms depending neither 


on M nor on @ are secular. Thus, the terms of F are sorted in the following way: 


F,: These terms are obtained by averaging with respect to M those parts of the first 
portion (i.e., J, part) of F that are independent of M and w. 


F,: These terms would be obtained by averaging with respect to M those parts of the 
second portion (i.e., J, part) of F that are independent of M and a, but since there are 
no such terms F, is zero. 


F,: These terms are obtained by taking the mean value of the second portion of F 
with respect to M. 


F,: These terms are obtained by subtracting the first-order secular terms from the first 
portion of F. 


The mean values are derived by 


_ ears 
O=5, |oam 


where Q represents any term in F that is being averaged, and the equations, 





ay 17a) = 

(<) = (2) dM =(1-e?)2 

r FRG ANG 6 

(= sin2v=( £] cos2v = 0 (2.7) 
r r 

a\ = 

(+) cosy = e(1-e?) 2 

‘ 


by Tisserand (1889) are used to obtain F,, F,, F,, and F,. The resulting equations are 


F, =0 (2.8) 





+5sin?jeos2(v+<)| 


2. SGP's Gravitational Model 
This model follows directly from the theory discussed in part one. It will be 
presented based on Project Space Track Report No. 3 (Hoots and Roehrich, 1980), 
maintaining a parallel structure between the equations and the computer code. 
a. Calculating Initial Constants 
Predictions are made by first calculating the constants below: 


(1) Semi-major axis. This is expressed by 


2 
a, = Fal (2.9) 


where 
a, = Semimajor axis 


k= 
n, = SGP type “mean" mean motion at epoch. 

(2) Deltal (6,). This is an intermediate calculation in modifying the semi- 
major axis to include perturbation effects due to the oblateness of the Earth. Deltal is 
expressed by 

Ze a (3cos? i - 1) 


aa (oe (2.10) 


3 


ee er 
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where 


a, = equatorial radius of Earth 


i =" mean" inclination at epoch 


€, ="mean" eccentricity at epoch. 
Deltal will be revealed in the next part. 
(3) Modified Semi-major Axis (a,). This is expressed by 
l 2 134.3 
=a |1-—6, -6,, -— 4, |. oan 
ie oS ee | a 
Equation 2.11 is derived in the following manner: 


By substituting F, for F in the differential equation (Roy, 1988, p. 316) 





the mean "mean" anomaly at epoch M, is obtained, 
M,= M, aris 


where M, is the unperturbed mean anomaly, with 


where n, is the unperturbed mean motion. For convenience let 


3 ae ee 
D= Fae —“é___(3¢0s a 1). 


3 
ae 
l-e }? 


oO 


Now, a, is chosen by convention to be 


a, =4,(1- Da,” | 


where a, is the unperturbed semi-major axis. Thus, using the relation 


3~ 2 2 


leads to the equation 
an,’ =k,?(1- Da,*) (14+Da,7).. 


Oo Qo 


Therefore, 





1 
a= e } (1- pa,?)(1+Da,2)> = a,(1-Da,7)(1+Da,")>. (2.12) 
n 


Equation 2.12 defines a, implicitly. This equation is solved iteratively by using Picard's 
method, where a, (see Equation 2.9) is used as the initial approximation (Office of 
Astrodynamic Applications HQ Fourteenth Aerospace Force, 1974, pp. 1-6), resulting in 
Equation 2.11. It should be noted that in practice, n, is obtained through a differential 
correction process on a Separate computer code (i.e., the main program DRIVER reads 
this quantity as part of the NORAD 2-line element set), and is accurate to the third order 
(in J,), thus a, is calculated to the same accuracy. 
(4) Semilatus Rectum (p,). This is expressed by 
p, =a\1=¢,)} (2.13) 
The semilatus rectum is the length of the chord from the force center to the ellipse, 
parallel to the minor axis bb’, as illustrated in Figure 2.2. 
(5) Radius at Perigee (g,). This is expressed by (see Figure 2.2) 
q, = 4,(1-e,). (2.14) 
(6) Mean "Mean" Longitude at Epoch (L,). This is the sum of the "mean" 
mean anomaly at epoch (M,), the "mean" argument of perigee at epoch (@,), and the 
“mean” longitude of ascending node at epoch. Thus, L, can be written as 


L,=M,+@,+Q.,. (2.15) 
(7) Change in the "mean" longitude of ascending node with respect to 


dQ dQ : , 
time = . The equation for a can be derived from the relation (Roy, 1988, p.316) 
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Figure 2.2 Semilatus Rectum and Radius at Perigee 





a2 an 
dt n.d, yi- e,. Sint, di, | 
by substituting F, from Equation 2.8 for F in Equation 2.16 yielding 
dQ_ 3, a, | 
armen =n, COSI,. (2.17) 


0 


(8) Change in the "mean" argument of perigee with respect to time (2) 
t 


: dw 
The equation for Pa can be derived from the relation (Roy, 1988, p.316) 


do __ cosi, OF | vines. OF 
at na, yl-e, Sin i, di, nd, €, de, 


by substituting F, from Equation 2.8 for F yielding 








2 
a : 
de aot pF Mel Sos" L, =)! (2.18) 


b. Secular Effects of Atmospheric Drag and Gravitation 
The atmospheric drag is taken to effect mean motion linearly with time, 
resulting in a quadratic variation of mean anomaly with time. The drag effect on 
eccentricity is modeled based on the assumption that perigee height remains constant. 
The following analysis includes the secular effects of atmospheric drag and gravitation: 
(1) Modified semi-major axis (a) to include atmospheric drag. This is 


expressed by 


6 — 2 19) 


where t—f, is the time since epoch. Equation 2.19 is derived from equation 2.16 which 


gives 


where n 1s approximated by a Taylor series expansion to the first order, 


n 
n=n, +2) — |\t-t,). 
1+ \(r-1,) 
It should be noted that the term - is generated on a separate computer 


code which uses a differential corrector. The main program DRIVER reads this quantity 
as part of the NORAD 2-line element set. 


(2) Modified eccentricity (e). This is expressed by 


q 
1-—, > 
Pe pee (2.20) 


ley Vora 
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Since the perigee radius is assumed constant, the resulting secular correction applied to 
the eccentricity may produce negative values for that element on near-circular orbits 
(Schumacher, 1991, pp. 106-107). A negative eccentricity is undesirable operationally. 
Thus, SGP arbitrarily resets these negative eccentricities to a small positive value 
(10~). 
(3) Modified semilatus rectum. This is expressed by 
p=all-e’). (2.21) 
(4) Modified "mean" longitude of ascending node (a, ). This is expressed 


by 
Q, Eig er) (222) 
‘ at 
(5) Modified "mean" argument of perigee (w, ). This is expressed by 
QW, = ey (2.23) 
J at 
(6) Modified mean "mean" longitude (L, ). This is expressed by 
dw dQ n 2 
bot, ene NP te 2) 2.24 
<1, +(n, +42 Vi ,)4 80-1) 2.24 


Equation 2.24 holds since 


dM , 
———_ — ? 
d(t-t,) 
where, M, may be approximated by 
2 


M,. =n,1, +n, (t=1,)+5A,(0=1, ) =M, +n, (t=1,) +>, (0-1) (2.25) 


Therefore 


yields the desired results, after substituting Equations 2.22, 2.23, and 2.25, and using the 
relation 


L,=M,+@,+2,. 


c. Long-period Perturbations 
Long-period perturbations are included as follows: 


(1) Vertical component of the eccentricity vector with respect to the line of 
nodes vector (a, yst I) This is expressed by 


aaa 
Qysp = eSIND, a) J 





sini, =e, sin@ (2.26) 


where e, and ® represent the resulting eccentricity and argument of perigee (see Figure 


2.3). Equation 2.26 is derived from Brouwer (1959) for long period perturbations. 
(2) The modified mean "mean" longitude (L). This is expressed by 


" 
ES] Ee yy [pea (2.27) 
4 J,p 1+ cosi, 
where 
A.wnsp =, COS (2.28) 


represents the horizontal component of the eccentricity vector with respect to the line of 
nodes vector (see Figure 2.3). Equations 2.27 and 2.28 are derived from Brouwer 
(1959) with Lyddane (1963) modifications for low eccentricities or inclinations. 


(3) Kepler's equation. Kepler's equation is solved iteratively for the sum of 


the eccentric anomaly and the resulting argument of perigee (E+) where 


(E+), =(E+@) +A(E+o) (2.29) 
with 
NORRIE U — aysr Oy: + Ans, SINCE +@), — (E+); . (2.30) 
—Gyysp SIN(E + M); — Ayo, COS(E + @); +1 
U=L-Q, 
and 
(E+), =U. 


Equation 2.30 is derived from the relation (Roy, 1988, p.85) 
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Figure 2.3 Eccentricity Vector and Node Vector 


fm Seed Cy SIN ee Ce +O). 
NCR) 
i—¢ cos E. 


since 


e, SINE; =@ys, SiIN(E + @); — Gyo, COS(E +O); 


and (2.54) 


€, COSE: = Ayo, SIN(E + @); +A,ys5, COS(E + @);. 


(4) Modified eccentricity. The square of the eccentricity can be expressed by 


’) 2 
€,” = (aus, ) + (aus: ) - (2.32) 
This equation simply comes from Equation 2.31. 


(5) Modified semilatus rectum. This is expressed by 


PL= pile) 


(6) Radial distance from Earth to satellite. This is expressed by 


~ a(l—e, cosE). 


(7) Time rate of change of the radial distance. This is expressed by 


r=k, va sin E. 
: : 
This equation is derived by differentiating Equation 2.34 giving 


ar _ dE 
— =ae, sinE— 
at dt 


where (Danby, 1962, p.131) 


dE_ k, l ky a_ ky 


dt ~ Jaq? 1-e, cosE e, cOSE ar ae 











(2.33) 


(2.34) 


(2.35) 


(8) Product of radial distance and time rate of change of the speed of the 


Satellite, v. This is expressed by 


ii 


r 


r=k, 


This equation 1s true since (Danby, 1962, p.130) 


dv ky A = BPs, 


=(1+ecosv) 


dt Tre 


(9) Argument of perigee (@). This is expressed implicitly by 











e, sin E 


a} . 
J 1+ Jl+e, 


(2.36) 


(2:37) 


(2.38) 


and 


e, sin E 


a 
COSk = — cos(E + @)—a,ys; +a AS = ie? (2.39) 
r geen 


where 
u=V+t+O. 


Equation 2.38 is derived algebraically using the relations 


sinu = SiN VCOS@ + COS VSiIN W 


* a e 
sinv =—vVl-e’ sinE 
r 
ée, -cosE 
CS) = 
€,cosE—1 


combined with Equation 2.31. Equation 2.39 is derived similarly. Now, u can be 


calculated from 





u= an”( aes } (2.40) 
coSu 
d. Short-period Perturbations 


The short-period perturbations are now included by 
2 


(ae ean cos (2.41) 
4° PL 
lee 

(2) u, =u-=J, “£-(7 cos? i, -1)sin 2u (2.42) 
8 Pr 

3)0.=2 3 j ae Oa 

(3) Q, =Q, i 2 > Cosi, sin 2u (2.43) 
3. a, 

(4) i, =i, +—J, > sini, cosi, cos2u. (2.44) 
4 Pr 


These short-period perturbations are derived from Brouwer (1959). They are 


the result of taking Brouwer's equations for short-period perturbations and dropping 
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terms with a coefficient of k,e or smaller (Hoots, 1975, p . 9). 
e. Resulting Position and Velocity Vectors 


The unit orientation vectors are calculated by (see Figure 2.4) 


U= Msinu, +Ncosu, 
V = Mcosu, —Nsinu, 


where 
M, =-sinQ, cosi, 
M =, M, =cos22, cosi, 
M, =sini, 
Ny, =cosQ, 
N= 4 N, =sinQ, 
N, =0 
The position and velocity vectors are then given by 
r=r,U 
and 
r=rU0+(n)V. 





Figure 2.4 Unit Orientation Vectors 
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(2.45) 


(2.46) 


(2.47) 


(2.48) 


IL SGP4 AND SDP4 SATELLITE MOTION MODELS 


A. OVERVIEW 

The SGP4 and the SDP4 models are the satellite motion models currently used by 
the AFSPACECOM. Given the six mean orbital elements, an epoch reference ume, and 
a drag factor, SGP4 or SDP4 can predict the position and velocity at a future ime. If 
the satellite is "near-earth" (i.e., orbit period less than 225 minutes), SGP4 propagates the 
ephemerides, otherwise the "deep-space” satellites are propagated by SDP4. Like SGP, 
SGP4 considers perturbing accelerations due to the oblateness of the Earth, asymmetry of 
the Earth's mass about the equatorial plane, and atmospheric drag. The differences 
between SGP and SGP4 are that SGP4 includes zonal harmonics through J, whereas 
SGP only includes through J,, SGP4 includes some "deep-space" terms not found in 
SGP, and SGP4 includes a drag force in its equations of motion where SGP performs a 
simple correction technique to include the effects of atmospheric drag. The SDP4 model 
includes the same perturbing effects as SGP4, but also includes effects due to the 
gravitational attraction of the sun and moon, and some sectoral and tesseral harmonics 
derived from the longitudinal variations of the Earth's gravitational potential. 

In Chapter II classification of satellite motion models was discussed. Like SGP, the 
SGP4 and SDP4 models solve the equations of motion analytically, and the variation in 
the satellite's motion is in terms of the changes in the osculating elements with respect to 
time, thus SGP4 and SDP4 are general perturbation models that use variation of 


elements. 


pa 


B. THEORY 
1. SGP4 Model 

Here a general discussion is given as presented by Paul Schumacher (1991). 
SGP4 is based on the theory of Lane and Cranford (1969) which uses the work of 
Brouwer (1959) and Brouwer and Hori (1961). Brouwer and Hori generalized the 
Original drag-free Brouwer model by including a drag force in the equations of motion. 
This involved modeling the atmosphere as a spherically symmetric, and non-rotating 
entity. The density is assumed to decrease exponentially with altitude, providing two 
free parameters (reference density and scale height) which can be chosen to represent the 
real atmosphere over some altitude range. This atmospheric model is combined with a 
gravitational force model which includes zonal harmonics J,,J,, and J,. The equations 
of motion are written in Delaunay variables and treated by von Zeipel'’s method. 
Unfortunately, this model was discovered to be very inaccurate for satellites with low 
perigee altitudes, exactly where the satellite is nearing decay. 

The work of Lane (1965) and Lane and Cranford (1969) attempted to overcome 
the problems of inaccuracy presented by the Brouwer and Hori model. As a possible 
improvement to the exponential density model, Lane assumed that the scale height 
parameter varies linearly with altitude. This assumption led to a new description of the 
density, namely, an "almost-power" function (Fitzpatrick, 1970). This function involves 
three parameters: reference density, reference scale height, and scale height gradient. 
The gravitational model includes zonal harmonics J,,J,,J,, and J;. Included in the 
solution are complete first-order short-periodic and secular terms, some second-order 
secular terms, and all first-order long-periodic terms, except for the mean anomaly which 
only includes drag-related terms. Singularities resulting from an eccentricity or 
inclination equal to zero are avoided by Lyddane's approach (1963) of replacing 
Delaunay variables with Poincare variables. In 1971 Cranford established these results 
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as the SGP4 theory, and by 1976 SGP4 had replaced SGP as the operational theory at the 
AFSPACECOM. Like SGP, SGP4 uses similar compromises in its implementation. All 
J, terms, all O(10~’) short-period terms and several long-period terms are not included 
in the solution. 

The satellite drag density is measured by the modified ballistic coefficient, B’. 
This coefficient is a product of satellite drag coefficient, area-to-mass ratio and a 


combination of density profile parameters. The modified ballistic coefficient, B’, is 


obtained from a differential correction process and is one of the input parameters to 


SGP4. It is somewhat analogous to the = parameter used for SGP. Due to the 


implementation compromises and the difficulty in acquiring accurate density values from 
a simple model, SGP4 is considered inaccurate for rapidly decaying satellites. 
2. SDP4 Model 

Here, again, a general discussion is given as presented by Paul Schumacher 
(1991). SDP4 includes most of the SGP4 implementation, but also includes some deep- 
space perturbations developed by Hujsak (1979). As mentioned previously, SDP4 only 
applies to satellites with period greater than 225 minutes. This model includes leading 
terms for lunar and solar attraction, as well as several longitudinal dependent Earth 
geopotential harmonics. Secular lunar and solar perturbations are obtained by averaging 
over the satellite period, and the remaining short-period terms are neglected. 
Singularities are found in the secular formulae at zero values of inclination and are dealt 
with by omitting sensitive terms when the inclination is relatively small. A second 
averaging Over the periods of the moon and sun respectively provides additional secular 
terms and some short-period terms. Earth geopotential resonance corrections are used 
only for in-track position. Although SDP4 has its shortcomings in its calculations of the 


perturbations, it is the only "deep-space" analytical model that is used operationally. 


a: 


IV. PARALLEL COMPUTING 


This chapter closely follows the parallel computing chapter written by Warren E. 


Phipps in his thesis on the parallelization of the NAVSPASUR model, PPT2 (1992). 


A. OVERVIEW 
1. Definition 

As scientific models become increasingly more complex and detailed, the 
computer requirements in terms of amount of computation and speed become more 
demanding. Computer engineers have responded by taking two approaches to achieve 
faster performance. 

The first approach is to increase the speed of the circuitry. Although great 
advances have been made in this area, the speed is bounded by the speed of light. Also, 
the design and manufacture requirements for continued increases in speed are very 
costly. The second approach is the use of parallel computing. Parallel computing or 
parallel processing provides an alternate means to achieve faster computer performance 
at a more affordable price. In this new field, various definitions in how to define parallel 
computing exist. One definition which describes parallel computing most completely 


and concisely may be taken from (Hwang, 1984, p.6): 


Parallel processing is an efficient form of information processing which 
emphasizes the exploitation of concurrent events in the computing process. 
Concurrency implies parallelism, simultaneity, and pipelining. Parallel events may 
occur in multiple resources during the same time interval; simultaneous events 
may occur at the same time instant; and pipelined events may occur in overlapped 
time spans. These concurrent events are attainable in a computer system at various 
processing levels. 


In his book, Hwang describes the processing levels. The top level is referred to as 
the program level which involves executing multiple programs by means of 
multiprogramming, time sharing, and multiprocessing. This depth of parallel processing 
is beyond the scope of this thesis. The lower levels (i.e., task level, inter-instruction 
level, and intra-instruction level) involve the execution of a single program which are 
applicable to the programming done for this research. Thus, for the purpose of this 
thesis, parallel computing is defined as the efficient form of information processing 
emphasizing the concurrent computations and manipulation of data to solve a single 
problem. 

2. Classification of Parallel Computers 

a. Type Classifications 

Implicit in the definition of parallel computing are three methods to achieve 
parallelism: temporal parallelism, spatial parallelism, and asynchronous parallelism 
(Hwang, 1984, p. 20). These methods provide a means to classify the various types of 
parallel computers. 

The first type is a pipeline computer. Pipeline computers perform overlapped 
computations , thus use temporal parallelism. Computations are broken into a number of 
segments or stages where the output of one segment is the input of another segment. If 
all the segments work at the same speed, the work rate of the pipeline is equal to the sum 
of the work rates of the segments, once the pipe is full. This is analogous to an assembly 
line in a factory. An example of a pipeline computer is a Cray-1. 

The second type is a processor array. A processor array is a set of identical 
synchronized processing elements to achieve spatial parallelism. It is capable of 
simultaneously performing the same operation on different data. An example of a 


processor array is the Connection Machine. 
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The third type is a multiprocessor. The term multiprocessor refers to a shared- 
memory multiple-CPU computer designed for parallel processing. These CPU's or 
processors are capable of performing independent operations and may achieve 
asynchronous parallelism (i.e., the processors have the ability to work with the most 
recently available data). An example of a multiprocessor is the Cm* of Carnegie-Mellon 
University. 

The final type is a derivative of the multiprocessor, the multicomputer. The 
multicomputer is a multiple CPU computer designed for parallel processing without the 
shared memory. Multicomputers can also achieve asynchronous parallelism through 
communications between the processors or nodes. An example of a multicomputer is the 
INTEL iPSC/2 hypercube. Since each processor has its own memory and can perform 
independent operations, multicomputers offer a greater degree of freedom in 
programming. However, communication between the nodes may require that 
synchronization be programmed into the multicomputer code to achieve the objective of 
the code. 

The four type classifications are not necessarily mutually exclusive. For 
example many commercially available processor arrays, multiprocessors, and 
multicomputers use pipeline processors to complete operations such as vector processing. 

b. Architectural Classifications 

Parallel computers can also be classified according to their architecture. One 
such classification was devised by Flynn (1966). He categorized digital computers by 
the multiplicity of hardware used to manipulate instruction and data streams. Four classes 


of computers resulted from this: 


1. Single-Instruction stream, Single Data stream (SISD). Most serial computers fall 
into this category. Although instructions are completed sequentially, this category 
includes overlapping instructions (pipelining). Therefore, pure pipeline processors also 
belong to this category. 
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2. Single Instruction stream, Multiple Data stream (SIMD). Processor arrays fall into 
this category. The processor array receives a single set of instructions, but each element 
receives and manipulates its own set of data. 


3. Multiple Instruction stream, Single Data stream (MISD). No current computers 
fall into this category. 


4, Multiple Instruction stream, Multiple Data stream (MIMD). Most multiprocessors 
and multicomputers fall into this category. The INTEL iPSC/2 is a MIMD machine. 


c. Topological Classifications 
Another classifying scheme for parallel computers that only apply to processor 
airays, multiprocessors, and multicomputers concerns the topology of the inter-processor 
connections. These connections are the means through which processors can 
communicate with one another. Seven important processor organizations are the mesh, 
the pyramid, the fat tree, the butterfly, the shuffle-exchange, cube-connected cycles and 
the hypercube. Figure 4.1 show examples of the first six processor organizations 
mentioned. Figure 4.2 shows examples of the hypercube topology. This thesis will 
obviously emphasize the hypercube topology. For a more complete discussion on the 
other topologies see (Quinn, 1987, pp. 25-30). 
3. Measurements of Performance 

The objective of parallel computing is to provide faster computation, thus certain 
defined parameters are needed to measure the performance of the parallel computer 
versus the serial computer. Computation speeds depend on many factors such as 
hardware design, technical specifications of the computer components, and the design of 
the algorithm to execute the computations. Two common measures of effectiveness that 
account for both the hardware and the algorithm design are speedup and efficiency. 


Speedup, S,, is defined as the ratio between the time needed to execute the most 


efficient sequential algorithm to perform a set of computations, 7,, and the time to 


perform these same computations using parallelism, i 
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Figure 4.2 Hypercubes of dimension zero through four 
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S.= = (4.1) 


It should be emphasized that even though the serial program and the parallel program 
execute the same set of computations, the two programs do not necessarily follow the 
same algorithm. Parallel programs often include additional instructions or operations to 
accommodate parallelism. Also, as previously mentioned, the serial program should be 
the most efficient obtainable, so as not to be misleading in the calculation of speedup. 
Many suggest that the times, 7, and 7;, be measured using a specific parallel computer 
and the fastest serial computer. But, the variation in the technical specifications between 
the two computers make the comparison unclear, thus do not provide an effective means 


of assessing the effectiveness of parallel computing. Therefore, for the purpose of this 


thesis, S, is defined as the ratio of time, 7,, taken by the parallel computer to execute the 


most efficient serial algorithm and the time, 7,, taken by the same parallel computer 


executing the parallel algorithm using p processors, 


S,=—. (4.2) 


The other measure, efficiency, E,, is defined as the ratio of the speedup to the 


number of processors, 


Ea (4.3) 


Efficiency accounts for the relative cost in terms of number of processors required in 
achieving a certain speedup. 

Many factors can limit the possible speedup and efficiency of a parallel program. 
These factors include the number of sequential operations that cannot be parallelized, the 
communication time among processors, and the time each processor is idle due to 


synchronization requirements. Many have argued that these factors severely limit the 
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benefits of parallel computing. Regardless of these factors, research demonstrates 
parallel computing as an effective means to reduce computation time (Quinn, 1987, pp. 
18-20). If one considers only the number of sequential operations in a program that 
cannot be parallelized, Amdahl's Law indicates that the maximum speedup achievable by 


p processors is, 


Ce oe 
Pf t+(l-f)/p 


where f is the fraction of operations that must be performed sequentially (Amdahl, 


(4.4) 


1967, pp. 483-485). Equation 4.4 provides an initial means to see if a given algorithm is 


a good candidate for parallelization. 


B. INTEL iPSC/2 HYPERCUBE 

The design of a parallel algorithm is done with a specific parallel computer in mind. 
This design involves maximizing the speedup and efficiency. In determining the parallel 
computing potential of the AFSPACECOM satellite motion models, SGP and SGP4, an 
INTEL iPSC/2 hypercube computer, located at the Department of Mathematics at the 
Naval Postgraduate School, was used. This computer is a MIMD multicomputer with a 
hypercube topology. It consists of a system resource manager called the host, and eight 
individual processors, referred to as nodes. The host is a 386-based computer, which can 
process data as well as providing an interface for the user. 

The nodes are complete and independent INTEL 80386 microprocessors. Each 
node also contains a 80387 numeric coprocessor, its own local memory, and a Direct- 
Connect Communications Module (DCM). Each node may also contain a Vector 
Extension (VX) module for pipelined vector operations. The iPSC/2 located at the Naval 


Postgraduate School has only one node that contains the VX module. 
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Communications among the nodes and host are done with message passing. The 
DCM permits messages to be sent from an individual node to a receiving node directly 
without disturbing the remaining nodes. Each node along the message path can be 
thought of as a switch which simply closes when a message is sent. Other hypercube 
designs require messages to be stored and forwarded with each node along a message 
path until the message is received by the destined node. 

The iPSC/2 uses a UNIX operating system and the available programming languages 
are Fortran or C. For a more detailed listing of the technical specifications of the 


INTEL iPSC/2 hypercubes see Appendix A. 


C. METHODS OF PARALLELIZATION 
1. Vectorization 
Vectorization is the process of converting blocks of sequential operations into 
vector instructions that may be pipelined. A simple example of vectorization using 
Fortran is the following: 


Sequential Code: 


Dol10 i=1,N 
1Q) z@)=x)+yQ) 


Vector Code (VAST2): 
call vadd(N,x,1,y,1,z,1) 
VAST2 is an example of a vectorization compiler to assist in the vectorization of a serial 
program. It is used by the iPSC/2 at the Naval Postgraduate School. There are many 
other vectorization compilers that are also available commercially. (Quinn, 1987, pp. 
233-235) 
Vectorizing compilers automatically vectorize serial codes before execution. 


Some vectorizing compilers may even indicate to the user areas of the program that limit 
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potential vectorization. Unfortunately, vectorizing compilers are not solely capable of 
Maximizing vectorization. Most vectorizing compilers are restricted in their ability to 
identify sequential blocks that can be vectorized and the resulting vectorization may not 
always be straight forward. The VAST2 compiler, for example, supports only Fortran 
programs and is only able to vectorize do loops and if statements. (iPSC/2 VAST2 
User's Guide, 1989) 

2. Distributing Computations 

Vectorization illustrates the first level of parallelism, in that tasks are parallelized 
on individual processors. On the other hand, to distribute partitions of a program to 
different processors on a multicomputer another approach is needed. Compilers created 
to achieve this higher level of parallelism have not been successfully implemented as 
with the vectorization compilers. Thus, the task of efficiently parallelizing an algorithm 
is left to the user. 

Since the performance of parallel algorithms depend on factors such as processor 
speed, memory access time, and memory capacity, in other words, the technical 
specifications of the parallel computer, the process of parallelizing an algorithm must be 
done with a particular parallel computer in mind. The multicomputer where each node 
has its own memory presents the greatest flexibility to the user. Here, the user has to 
partition the problem among the processor nodes. The hypercube topology allows the 
user to use the natural topology of a given problem to divide the problem into parallel 
processes. A process is defined as a single statement or a group of statements which are 
a self-contained portion of the total computations. For the INTEL iPSC/2, two 
decomposition strategies are suggested: Control Decomposition and Domain 


Decomposition. (iPSC/2 User's Guide, 1990, pp. 4-1 - 4-6) 


82 


a. Control Decomposition 

Control Decomposition is the method of dividing tasks or processes among the 
individual processors or nodes, a divide and conquer approach to the problem. This 
method is recommended for problems with irregular data structures or unpredictable 
control flows. 

One way control decomposition is implemented is by using a self-schedule 
approach. One node acts as the manager while the remaining nodes act as workers. The 
managing node maintains a list of processes to be accomplished by the working nodes 
and distributes these processes accordingly. The working nodes request jobs, receive 
processes, and perform the given task. Of course, in the self-scheduling method the cost 
is one processor, namely, the managing processor. (iPSC/2 User's Guide, 1990, p. 4-4) 

A second method of control decomposition involves the pre-scheduling of the 
processes. The exact tasks of each processor are explicitly stated in the parallel program. 
Although, this method saves the cost of one node, a great deal of care must be taken to 
ensure the processes are distributed as evenly as possible among the nodes. 

b. Domain Decomposition 

Domain Decomposition involves dividing the aot de or domain among the 
nodes. These partitioned sets may be specific data sets such as blocks of a matrix or may 
represent a specific grid such as used in finite difference or finite element methods to 
solve partial differential equations. Unlike control decomposition, domain 
decomposition requires that each node perform essentially the same tasks but with 
different input data. 

Domain decomposition is suggested if a program contains calculations based 
on a large data structure and if the amount of work is the same for each node. For 
example, the multiplication of two large matrices by block multiplication uses domain 


decomposition. Although domain decomposition may appear to be perfectly 
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parallelizable, the user must use caution to ensure each input set requires about the same 
amount of work. 
3. Improving Performance 

The parallelization of a problem may require the use of control decomposition, 
domain decomposition, or a combination of both methods to be an efficient algorithm. 
Once a particular method is chosen, several factors need be considered to improve the 
performance of the parallel algorithm. These factors include load balance, 
communication to computation ratio, and sequential bottlenecks. 

a. Load Balance 

Load balance refers to the degree to which all nodes are working or active. In 

the case where the work is not evenly distributed among the nodes, the parallel algorithm 
will be limited in its speedup ability. Ways to achieve load balancing is through a 
decrease in grain size of the parallel tasks, self-scheduling tasks, or redistributing the 
domain. Grain size refers to the relative amount of work completed in parallel. For 
example, distributing computations may be considered as large grain parallel computing 
while pipelined vector operations are small grain parallel computing. 

b. Communication to Computation Ratio 

Communication to computation ratio is the ratio of the time spent 

communicating to the time spent computing. The time loss for communications is 
unavoidable in parallel algorithms, except for perfectly parallel problems. A large 
communication to computation ratio restricts the performance of a parallel program. The 
objective is to maximize the time a node spends computing, thus minimizing the time it 
is communicating. Reducing the communication to computation ratio may be 
accomplished by increasing the grain size, grouping messages, or recalculating values 


vice receiving the value from another node. 


c. Sequential Bottlenecks 

In some cases a task cannot begin unless a previous task is completed, thus 
limiting the number of tasks that can be done in parallel. A sequential bottleneck is the 
situation where processors are waiting for another processor to complete a task before 
they may continue. The fraction of operations of a algorithm that cannot be done in 
parallel can greatly limit speedup as seen by Amdahl's Law (Equation 4.4). Sequential 
bottlenecks are potentially found with any requirements of the nodes to synchronize. The 
only way to remove sequential bottlenecks is to modify or restructure the algorithm in 


order to overlap Sequential code with other computations. 
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V. PARALLELIZATION OF SGP, SGP4, AND SDP4 


The objective of this thesis is to assess the parallel computing potential of the 
AFSPACECOM's satellite motion models SGP, SGP4, and SDP4. This analysis may be 
done by comparing the speedups and efficiencies of different parallel algorithms 
designed using the various methods and Strategies discussed in Chapter IV. Warren 
Phipps (1992) concluded in his thesis that vectorization and control decomposition were 
not beneficial in reducing the computation time of the NAVSPASUR model (PPT2), 
whereas the domain decomposition strategy showed promise. Since like the PPT2 
model, the SGP, SGP4, and SDP4 models are analytic and have computation times of the 
same order of magnitude, it was decided that this thesis would only investigate the 
domain decomposition strategy for all three models. 

The objective of the domain decomposition method is to the reduce the SGP, SGP4, 
and SDP4 models’ computation time by the concurrent computation of several satellite 
data sets. Each node of the hypercube performs identical tasks on different satellite data 
sets, at the same time. Thus, as was considered with PPT2, the ultimate goal is to reduce 


the overall computation time for several objects in orbit. 


A. ALGORITHM 

The parallel algorithm design used for SGP, SGP4, and SDP4 is identical to the 
parallel design used by Phipps (1992) to achieve the domain decomposition of PPT2. A 
single node is devoted to reading all the input data and distributing this data to the 
working nodes (i.e., those processors that actually propagate the given input data by 
either calling the SGP, SGP4, or SDP4 subroutine to produce a position and velocity 


vector for each time required beyond epoch), and a single node is to collect the results 
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and write them to an output file. Figure 5.1 illustrates how the satellite data is distributed 
for a three dimensional hypercube. 

In his thesis, Phipps presented the advantages and disadvantages of using this 
particular parallel algorithm design: 
advantages: 


* Relatively simple to apply. No requirement for communication or synchronization 
among the working nodes. 


* The existing subroutine (i.e., PPT2 for Phipp's thesis, and SGP, SGP4, or SDP4 for 
this thesis) may be used with only minor modifications. The same input parameters 
can be used for the parallel program as for the serial program. 


* The strategy to achieve maximum efficiency is reduced to developing an algorithm 
to distribute the data in a timely manner. Thus, decreasing the wait time for any one 
node. 


disadvantages: 


* Potential for sequential bottlenecks at input/output portions of algorithm. Reading 
and writing to an external file is extremely time consuming. With the iPSC/2 
hypercube used for this thesis and Phipp's thesis the input/output is completed 
sequentially. 


* The cost is the loss of two nodes, the input node and the output node. 
A complete listing of the source codes, and input/output parameters for the parallelized 
versions of SGP, SGP4, and SDP4 are contained in Appendix B. 

The serial algorithms used for this research for SGP, SGP4 and SDP4 are minor 
modifications of the original algorithms obtained from the AFSPACECOM. The 
modified serial algorithms read the entire batch of satellite data before propagating any 
of the input. For example, suppose 1000 satellites need to be propagated, the modified 
serial algorithm would read the input data for all 1000 satellites and proceed to propagate 


the position and velocity vectors one right after the other for each satellite. On the other 
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Figure 5.1 Distribution of Satellite Data 


hand, the original serial algorithms read in one satellite, propagate the data, read in the 
next satellite, and propagate its data, etc. This modification was performed to create 
more of a similarity between the serial algorithms and the parallel algorithms. It should 
be noted that the execution times of the modified serial algorithms are equivalent to the 
Original serial algorithms. 

The execution times for the serial and parallel programs for each of the models 
(SGP, SGP4, and SDP4) were obtained by using the internal clocks contained in each 
processor. For the parallel programs, the processor that depicted the largest amount of 
time for execution was used. Each execution time is the result of an average of ten 


recorded execution times. 
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1. Assessment of SGP 
a. Results 

The experimental results in the analysis of the parallelization of SGP depict 
similar trends and conclusions as obtained by Phipps (1992) in his analysis of PPT2. 
Figure 5.2 is a plot of the mean execution time for the serial version and the parallelized 
versions of SGP, using a four-node and an eight-node hypercube, versus the number of 
satellites propagated. See Appendix C for the execution times. As seen by Figure 5.2, 
the parallelized version of SGP was successful in reducing overall computation time. 
For convenience the serial version of SGP will be referred to as SSGP and the parallel 
version as PSGP. This notation will also be used in the assessment of SGP4 and SDP4. 
Table 5.1 gives the efficiencies and speedups for different numbers of satellites using 
four nodes and eight nodes. This table depicts a significant increase in efficiency using 
eight versus four nodes. This increase in efficiency presents the possibility of achieving 
even higher efficiencies with the use of PSGP for higher dimensional hypercubes. Table 
5.1 also indicates an approximate three to one ratio in the speedups obtained using eight 
nodes versus four nodes. This is expected since the eight node hypercube contains six 
“working” nodes, which is three times larger than the two "working" nodes used in the 
four node hypercube. 

Another trend observed in Table 5.1, more notably for the case with four 
nodes, is the slight increase in performance to a peak value and then slightly decreasing 
as the number of satellites increases. For this parallel algorithm the computation to 
communication ratio does not vary with the number of satellites. In agreement with 
Phipps (1992), the increase in performance must mainly be due to the lessening impact of 
the algorithm's overhead on total execution time. This overhead includes some small 


computations performed by each working node to determine the number of satellites it 
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Figure 5.2 SGP Execution Times 
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must receive from the distributing node. Since these computations are only completed 
once in the program, the time cost with these calculations become negligible as the 
number of satellites propagated increases. One possible phenomenon to offset this 
increase may have to do with the amount of message sets contained in the local buffer of 
a working node in terms of the time cost to manage these sets (i.e., each message set 
corresponds to a satellite), which may increase more than linearly with the number of 
Satellites. If a node is not ready to receive a message it stores it in a local buffer. As 
will be explained in the next section, the use of four and eight nodes result in the working 
nodes having message sets received from the distributing node stored in their local 
buffers. For a large number of satellites, several messages will be stored in each local 
buffer soon after the program is initiated. Although the time for a node to read from its 
local buffer is insignificant, the time involved in managing several storage sets may not 
be. The efficiency, in the case of eight nodes, peaks and levels out at .561 for 216 
satellites and decreases only slightly to .559 at 10000 satellites. The efficiency for four 
nodes peaks at .380 for 36 satellites and decreases to .372 at 10000 satellites. 
b. Improvements 

As stated in the previous section, PSGP may yield increased efficiencies and 
speedups if applied to a hypercube of greater dimension than three available on the 
iPSC/2 hypercube used for this thesis. Since the number of working nodes is not fixed 
for this algorithm, PSGP can be applied to any dimension hypercube without 
modifications. An increase in efficiency should occur with an increase in the hypercube 
dimension until the time to distribute a separate satellite data set to each working node 
exceeds the time required by the node to propagate a single satellite. Hence, the 
performance of the algorithm could improve by applying it to an optimal dimension 


hypercube. 
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Since the hypercube at the Naval Postgraduate School is limited to eight 
processors, a reasonably accurate model developed by Phipps for his analysis of PPT2 
was used to estimate the optimal hypercube dimension. The total execution time for 
PSGP to propagate n satellites with p processors, t(p), can be modeled by 

t(p)=t,,(p)+t,,(p) +t, (p) (all) 
where t¢,,(p) is the time the last node must wait to receive its first satellite data set, 
t_,(p) is the total time the last node must wait to receive all of its subsequent satellite 
data sets, and t, (p) is the time for each node to propagate its share of the 7 satellites. 

As discussed in the previous chapter, the iPSC/2 uses a Direct-Connect 
Module (DCM). Because of the DCM the startup time for a message to be passed 
between two nodes is essentially constant regardless of the length of the path the message 
is sent through. Thus, the‘time to send a message between two nodes is only a function 
of the size or number of bytes contained in the message. Since the size of all the 
messages between the distributing node and the working nodes are of the same size (104 
bytes), the time to send a single message between the distributing node and each working 
node is essentially constant. In this algorithm there are p—2 working nodes. Letting 
t, (1) represent the time to send a single message between the distributing node and a 
working node, t,, (p) may be modeled by 

ty, (P) =((p—2)— ty CD) = (p—3)ty (1). (5.2) 
To determine f¢,, (1), a small program was created and executed on the iPSC/2 hypercube 
at the Naval Postgraduate School. This program timed the message passing between two 


nodes for various size messages. For each case the time to send a single byte was 


calculated, and these values were averaged to equal 3.60x10~ ane This value is 
yte 


; M 
equivalent to Myf) ees which when rounded matches the hardware speed of 
Sec 
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2.8 EAE. quoted in the iPSC/2 technical summary (1988, p.11). Thus, the mean 
sec 


value of t,,(1) is approximately 
(3.60 x 10~ )(104) =.0374 msec. 

The total wait time for a working node to receive subsequent satellite data sets, 
t..(p), is a function of the elapsed time for the working node to propagate a single 
satellite data set, the elapsed time for the distributing node to send a subsequent satellite 
data set to the working node, and the number of satellites the working node must 
propagate. Since the distributing node distributes the data while the working nodes are 
computing, the wait time is zero if the subsequent satellite data arrives before the 
working node is ready to receive it. On the other hand, if the subsequent satellite data 
arrives after the node finished with the previous satellite data set, the wait ume is the 
difference between the elapsed time for the distributing node to send the node another 
satellite data set and the computing time for the previous satellite. Since the distributing 
node must send a satellite data set to each of the other nodes before sending a subsequent 
data set to the last node, the elapsed time for the distributing node to send another data 
set can also be modeled by ¢,, (p) in Equation 5.2. Thus, the total wait ime is the wait 
time for each subsequent satellite data set multiplied by the number of satellite data sets 


received by each working node. Letting the time to propagate a single satellite be 


represented by fl, t,, (p) may be modeled by 


0 if ty(p)<tl 


fa (P) = (2-1 Jla(e)=) if t,,(p)2tl. ee) 


The time, t1, was measured to be 4.60 milliseconds using SSGP. 
Assuming the time for one node to propagate n satellites, t(1), is 


t(1)=n(tl), 
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the total computation time for each working node, tf, (p), may be approximated by 


z n(tl) 


t.(p) = 


(5.4) 


Substituting Equation 5.1 into Equations 4.2 and 4.3, the speedup and 


efficiency using p processors can be modeled by 


_ t(1) 7 n(tl) 
p= ag Tae TE 
tp)  t, (p)+t,.(p) +t. (p) (5.5) 
S n(tl) 


EB 


P 


sae Ei = eae INES eee 
P Pt, (p)+t,.(p)+t,(p)] 


The total execution time, speedup, and efficiency (t(p), S,, and E,) were calculated 
using Equations 5.2 - 5.5 along with the following substitutions: 

* ty (1) =.0374 msecs 

* t1 = 4.60 msecs 

* n = 5000 satellites 
The number of satellites was chosen to be 5000 since this seemed to be a reasonable 
number for SGP's current use at various sensor sites. Figures 5.3 and 5.4 indicate the 
estimates of t(p), S,, and E, using 4 to 1024 processors (a cube dimension of 2 to 10). 
Using this model, PSGP is capable of achieving a maximum efficiency above .9 using a 
7 dimensional hypercube (128 nodes). These plots are only estimates of the true values 
of speedup and efficiency. Their prediction for the speedup and efficiency using 4 and 8 
nodes are slightly higher than the obtained experimental values, but most certainly 
provide a good indication of the parallel computing potential of PSGP for higher 
dimensional hypercubes. Notice for 4 and 8 nodes, t,,(p) equals zero resulting in stored 


messages in the local buffers of the working nodes. Since t,,(p) is much much less than 


n = 5000, f,, (1) =.0374 msec, 1 = 4.6 msec 
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Figure 5.3 Estimated Execution Time of PSGP for Various Hypercube Sizes 
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Figure 5.4 Estimated Speedup and Efficiency of PSGP for Various Hypercube Sizes 
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tl, several messages will be stored in the buffers, when propagating a large number of 
satellites, shortly after the program is initiated. 
2. Assessment of SGP4 
a. Results 

Like SGP, the parallelization of SGP4 also produced similar trends and 
conclusions as the parallelization of PPT2. Figure 5.5 is a plot of the mean execution 
time for the serial and the parallelized versions of SGP4, using a four-node and an eight- 
node hypercube, versus the number of satellites propagated. See Appendix C for the 
execution times. Table 5.2 gives the efficiencies and speedups for a various number of 
satellites using four and eight nodes. Again, an approximate three to one ratio exists 
between the values of speedup using eight nodes versus four nodes. Table 5.2 depicts the 
same trend as observed with PSGP. In the case of four nodes, the peak value occurs at 
36 satellites with an efficiency of .400. For eight nodes this occurs at S00 satellites with 
an efficiency of .595. Notice again an increase in efficiency as the number of nodes 
increased from four to eight, motivating the investigation of using hypercubes of higher 
dimension. 

b. Improvements 

The model used in the previous section to estimate an optimal dimension 
hypercube is also used here, since PSGP uses the domain decomposition parallel 
algorithm. The total execution time, speedup, and efficiency (t(p), Sp, and E,) were 
calculated using Equations 5.2-5.5 with the following substitutions: 


* t,,(1) = .0634 msec (the message size between the distributing node and the 
working nodes is 176 bytes). 


* tl = 6.60 msecs 


* n = 5950 satellites 
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Figure 5.5 SGP4 Execution Times 
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The number of satellites was chosen based on the information received from the 
AFSPACECOM. The AFSPACECOM in Colorado Springs propagates data on 
approximately 7000 satellites a day, and about 85 percent of those are low earth orbit, 
yielding the 5950 value. Figures 5.6 and 5.7 depict the estimates of t(p), Sp, and E, 
using 4 to 1024 processors. Using this model, PSGP4 is capable of achieving a 
maximum efficiency greater than .9 using a 6 dimensional hypercube (64 nodes). As 
with PSGP, the estimates using this model for PSGP4 predict a slightly better 
performance than actually achieved for a hypercube of 2 and 3 dimensions. 
3. Assessment of SDP4 

a. Results 

The parallelization of SDP4 was also successful, yielding similar results as the 
previous parallel algorithms. Figure 5.8 plots the mean execution time for the serial and 
the parallelized versions of SDP4, using a four-node and an eight-node hypercube, versus 
the number of satellites propagated. See Appendix C for the execution times. Table 5.3 
gives the efficiencies and speedups for different number of satellites using four nodes and 
eight nodes. As expected the speedups achieved using eight nodes is approximately three 
times greater than that for four nodes. Again, there are higher efficiencies obtained using 
eight nodes versus four nodes for any given number of satellites. As expected, peak 
efficiencies exist as a function of the number of satellites for both the four node and 
eight node case. In using eight nodes a peak efficiency of .644 is achieved when 
propagating 500 satellites. In using four nodes a peak efficiency of .433 occurs at 144 
satellites. 

b. Improvements 


Using the same model, t(p), S,, and E, were calculated using Equations 5.2 - 


5.5 with the following substitutions: 


n = 5950, ty (1) =.0634 msec, t1 = 6.6 msec 
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Figure 5.6 Estimated Execution Time of PSGP4 for Various Hypercube Sizes 
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Figure 5.7 Estimated Speedup and Efficiency of PSGP4 for Various Hypercube Sizes 
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Figure 5.8 SDP4 Execution Times 
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* t,,(1) =.0634 msec 

* t] = 10.8 msecs 

* n = 1050 satellites 
The value 1050 was chosen since approximately 15 percent of the 7000 satellites 
propagated a day at the AFSPACECOM are deep space. Figures 5.9 and 5.10 indicate 
the estimates of t(p), S,, and E, using 4 to 1024 processors. Using this model PSDP4 
is capable of achieving a maximum efficiency of about .9 using a 6 dimensional 
hypercube (64 nodes). 

4. Comparing the Parallel Algorithms for PPT2, SGP, SGP4, and SDP4 

Before comparing the four models, a few parameter values need to be presented 
from Phipp's assessment of PPT2. The time to propagate a single satellite, #1, using the 
parallel version of PPT2 is 11.2 milliseconds. The maximum efficiency achieved on an 8 
node hypercube was .67. Using 4 nodes, the maximum efficiency was .45. The number 
of satellites propagated ranged from 12 to 20736. Using the same model presented in 
this thesis, derived by Phipps (1992), to predict program performance versus hypercube 
dimension with the following substitutions: 

* t,,(1) =.693 msec 

* tl =11.2 msecs 

* n = 1728 satellites, 
the parallelized version of PPT2 is capable of achieving near .9 efficiency using a 4 
dimensional hypercube (16 nodes). 

One observation that can be obtained in comparing these four models, using the 
experimental data, is based on the execution time to propagate a single satellite, tl, and 
the resulting efficiencies for an eight-node and a four-node hypercube. Table 5.4 shows 
an increase in efficiency for both the eight-node and the four-node hypercube with an 


increase in satellite propagation time, tl. This conclusion is not surprising since an 
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Figure 5.11 Estimated Speedup and Efficiency of PSDP4 for Various Hypercube Sizes 
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VL FURTHER ANALYSIS OF PHIPPS' DOMAIN DECOMPOSITION MODEL 


The analysis in this chapter will only be concerned with SGP4 and SDP4 since these 
models are currently used operationally by the AFSPACECOM. Also, the resulting 
conclusions can be applied to SGP, since like SGP4 and SDP4, SGP is an analytic model 
with a comparable execution tme. In the previous chapter Phipps’ (1992) domain 
decomposition model was used to determine the optimal dimension hypercube for PSGP, 
PSGP4, and PSDP4. In this analysis the value of rl, the time to propagate a single 
satellite, represented only one call to the satellite motion subroutine resulting in one set 
of output for a specified time beyond epoch. This chapter will consider the effect of 
several calls to the subroutine per satellite, which is more realistic operationally. Each 
call corresponds to a specified time beyond epoch producing an output consisting of a 
position vector and velocity vector. 

Another factor this chapter will address is the effects of the input node and output 
node (i.e., the reading and writing portions of the program) on the program performance. 
The model developed by Phipps (1992) to assess the performance of his domain 
decomposition strategy did not include timing the input node or output node. The 
iPSC/2 hypercube at the Naval Postgraduate School is not capable of concurrent reading 
or writing from the disk by its processors. In other words, only one node can access the 
disk at any given time. The hardware, however, does exist to give the hypercube this 
capability. Assuming his domain decomposition strategy will eventually be applied to a 
hypercube with this hardware or possibly another type of parallel machine with 
concurrent disk access capabilities, Phipps chose not to investigate the effects of the input 


node and output node at this phase of his research. 


For the sake of completeness this chapter develops a model for Phipps’ domain 
decomposition strategy using the iPSC/2 hypercube at the Naval Postgraduate School 
which includes the effects of the input and output nodes. The model's validity is 
obtained by comparing some of its theoretical program times with the corresponding 
experimental times. It is assumed the hypercube has its current capability where only 
one node can access the disk at any given time. The objective in analyzing this new 
model is to measure the performance of the entire domain decomposition strategy 
developed by Phipps (1992) when applied to the iPSC/2 hypercube at the Naval Post 
Graduate School, and use this assessment to further improve the parallelization strategy 


for SGP4, SDP4, and PPT2. 


A. ASSESSMENT OF SEVERAL SUBROUTINE CALLS PER SATELLITE 
Here Phipps’ model will be used to determine the optimal dimension hypercube for 
SGP4 and SDP4 assuming a reasonable number of subroutine calls for each satellite 
motion model. The number of subroutine calls were chosen based on estimates received 
from the AFSPACECOM in Colorado Springs. 
1. Assessment of SGP4 
SGP4 propagates data for low earth satellites which require more frequent 
tracking than deep space satellites. Thus, a relatively large number of observations are 
received per day by the AFSPACECOM for each low earth satellite resulting in several 
calls to the SGP4 subroutine. Most likely the number of subroutine calls will fall 
between 50 and 100. An average value of 75 will be used in Phipps' model. 
All parameter values will remain the same as presented in the assessment of 
SGP4 in the previous chapter except for t1, the time to propagate a single satellite. Each 
time a new set of satellite data is received by SGP4 an initialization subroutine is called 


before the SGP4 main subroutine is called (i.e., the subroutine that produces the position 
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and velocity vectors for the specified time since epoch). For every other incremented 
time specified for the same satellite, the initialization program is not called. Thus, the 
execution times for these specified times are slightly less than the computation time it 
takes for the first time value specified. Let these shorter execution times which are of 
equal value be represented by the parameter ts, and let tf denote the first execution time. 
Let m represent the number of subroutine calls per satellite. Therefore, t1 is expressed 
by 
tl=t+(m-D)sts. (6.1) 
Thus, tl is calculated to be 
tl = 6.6+(74)(2.2) = 169.4 msec. 
Figure 6.1 depicts the speedup and efficiency versus hypercube dimension 
obtained from Phipps’ model and using the following substitutions: 
* t, (1) = .0634 msecs 
* tl = 169.4 msecs 
* n = 5950 satellites 
Since the number of satellites, m, and the time to send a single message from the input 
node to the output node, ¢,,(1), remains the same as for the m=1 case, a direct 
comparison can be made between the graphs in Figure 6.1 and Figure 5.7. Clearly, much 
higher speedups are obtainable for the m=75 case. If a large speedup is desired at the 
expense of efficiency, a maximum speedup near 1800 is achievable compared to a 
maximum speedup near 100 for the m=1 case. If maximum efficiency is desired the 
m=75 case depicts the potential to achieve near 100% efficiency using a hypercube of 
dimension eight (256 nodes) with a corresponding speedup near 250, compared to the 
m=] case where the maximum efficiency is also nearly 100% but corresponding to a 


speedup just over 60 using a hypercube of dimension six (64 nodes). The above 
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Figure 6.1 Estimated Speedup and Efficiency of PSGP4 for Various Hypercube Sizes 
assessment is expected since a higher m yields an increase in the computation to 
communication ratio. 
2. Assessment of SDP4 
SDP4 propagates data for deep space satellites where the subroutine calls are 
likely to fall between 1 and 50. An average value of 25 will be used in Phipps’ model. 
Thus, tl 1s calculated to be 
10.8+(24)(4) = 106.8 msecs. 

Using the above value for t1 and the same substitutions for ¢,,(1) and nm as for the m=] 
case, 

* ty, (1) =.0634 msecs 

* n = 1050 satellites, 
Figure 6.2 depict the speedup and efficiency. In comparing Figure 6.2 with Figure 5.11 
(the m=] case), much higher speedups are observed with the m=25 case yielding a 
maximum speedup near 650 compared to a maximum speedup near 150 for the m=] 
case. Considering maximum efficiency, the m=25 case has the potential to achieve near 
100% efficiency using a hypercube of dimension seven (128 nodes) with a corresponding 
speedup of about 125. Whereas for the m=1 case the maximum efficiency is achieved 
with a hypercube of dimension six (64 nodes) and a corresponding speedup near 60. 


Again, these results are expected due to the increased computation time with a higher m. 
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Figure 6.2 Estimated Speedup and Efficiency of PSDP4 for Various Hypercube Sizes 


Notice, the speedup achievable is not as high as for PSGP4, because the number of 
subroutine calls per satellite is one-third that of PSGP4 (i.e., 25 calls compared to 75 
calls). 


B. REVISING PHIPPS' MODEL TO INCLUDE READS AND WRITES 

Reading and writing to a disk is extremely costly in terms of computer time. As will 
be observed in assessing this revised model for SGP4 and SDP4, the initial access to the 
disk by the input node or output node produces a relatively large overhead for each of the 
programs. If the calculation portion of the satellite motion program isn't large enough in 
terms of computer time relative to the initial disk access time as well as the time to 
continue reading or writing from or to the disk after initial access, the parallel algorithm 
performance suffers greatly. This section of Chapter VI develops a model for Phipps’ 
domain decomposition strategy which includes the effects of the input and output node, 
assuming the use of the iPSC/2 hypercube where concurrent disk access is unavailable. 

1. Revised Model 

Since the output node is always the last node to finish in Phipps’ domain 


decomposition algorithm, the program total time, t(p), can be determined by the sum of 
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the output node's total wait time, mv, and the total time to write to the disk, ¢d. 
Therefore, the total ime, t(p), 1s represented by 
t(p)=twrtd. (6.2) 
The total time to write to the disk, td, can be modeled by 
td = td, + (n)(m) (td, ) (6.3) 
where 
td.= initial disk access time which includes time to write labels for output columns 


td_,= time to write data output (position and velocity vector) for each time specified 
(i.e., each time specified corresponds to a subroutine call) 


n = number of satellites 
m = number of subroutine calls per satellite. 
The total wait time, mw, can be broken in two areas, the initial wait time, tw,, the 
Output node must wait before it can initially access the disk (i.e., assuming the input node 
(node zero) accesses the disk first which has been observed to be more likely), and the 
total sum of the remaining wait times, tw,. The initial wait time, tw,, is the total time for 
all the satellites to be read in by node zero. Thus, tw, can be modeled by 
tw, = tr, +(n—1)r7, (6.4) 
where 


tr, = time for node Zero to initially access the disk which includes reading the first 
satellite data set 


tr, = time to read each satellite data set after the first one. 
The total sum of the remaining wait times, tw,, depends on the time to propagate a single 
Satellite, t1, by each of the working nodes and the number of working nodes. Since the 
initial disk access time, td., far exceeds tl for SGP4 and SDP4, even with considering 75 
subroutine calls for SGP4 and 25 subroutine calls for SDP4, there is no wait time by the 
output node between completing its initial disk access and writing its first set of output 


data. For all subsequent satellite data sets the wait time is also zero since 11, the time to 
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propagate the output for a single satellite, is found to be less than the time to write the 
satellite's output which may be modeled by 
(m) (td, ) (6.5) 
for each of the satellite programs SGP4 and SDP4. This also implies using any more 
than two working processors will not improve program performance, therefore p, the 
number of processors, will equal four for this model. 
Combining Equations 6.2 - 6.5 the total program time for PSGP, PSGP4, and 
PSDP4 can be modeled by 
t(4) = tr, + (n—1)er, + td, +(n)(m)td,. (6.6) 
Combining Equations 6.1 - 6.5, the time to execute SGP4 and SDP4 using one processor, 
t(1), is expressed by 
t(1) = tr, +(n— ler, +n[of +(m— Dts] + td, + (n)(m)td,. (6.7) 
Thus, using Equations 4.2 ,4.3, 6.6, and 6.7 the speedup and efficiency using four 


processors may be modeled by 


_ tl) _ wr, +(n—-l)er, +n[tf +(m—1)ts]+ td, + (n)(m)id, 
+" 1(4)_ tr. +(n—Ltr, +d, +(n)(m)td, 
(6.8) 
cS, += De, only pon lis] ia +n) (mia. 
4 A[tr, + (ner, + td, + (n)(m)td, ] 


E,= 

During the development of this model, consideration was given to the scenario in 

which the input node reads only one satellite data set (or a small portion of the entire 
batch of satellite data sets), distribute to a working node (or working nodes) and then 
read the next satellite data set (or next portion of satellite data sets), then distribute, etc., 
vice reading the entire batch of satellite data sets and then distributing. Since the 


restriction still exists where only one of the input or output nodes can access the disk at 


any given time while the other must wait, the results with this revised method of reading 
the disk would not present a significant difference in performance. 
a. Assessment of SGP4 
Table 6.1 gives values for speedup and efficiency for various n and m values 
(i.e., number of satellites and number of subroutine calls per satellite) using the revised 


model and the following substitutions obtained experimentally: 


* tr, =514 msecs * ts =2.2 msecs 
* tr, = 18.2 msecs * td, = 542 msecs 
* tf =6.6 msecs * td, = 8.85 msecs 


The single read and write times (tr, and td,) varied with the number of satellites and the 
number of subroutine calls per satellite (7 and m). The read time per satellite increased 
somewhat as the number of satellites, n, increased (values ranging from 16.4 to 22.7 
milliseconds), so an average of 18.2 milliseconds is used for tr,. The write time for each 
Subroutine call decreased somewhat as the number of subroutine calls per satellite, m, 
increased (values ranging from 12.1 down to 7.73 milliseconds), thus an average value of 
8.85 milliseconds is used for td,. Table 6.1 indicates minimal variance in speedup or 
efficiency with changes in m or n. 

The speedup and efficiency were obtained experimentally using four and eight 
nodes for the m=5 case for various numbers of satellites, and are given in Table 6.2. 
These actual efficiencies and speedups indicate that the estimated performance values 
listed in Table 6.1 are low, but still reasonable. Table 6.2 reinforces the observation that 
more than four nodes does not improve performance, in fact the contrary is true. 

The speedups and efficiencies depicted both theoretically and experimentally 


using four nodes are not bad considering there are only two working nodes. But, 
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TABLE 6.1 PSGP4 ESTIMATED PERFORMANCE 







Number of Subroutine Calls Per Satellite (m) 






TABLE 6.2 PSGP4 ACTUAL PERFORMANCE 


Number of Four Nodes Eight Nodes 





obviously, this configuration is not desirable for SGP4 since the performance is limited 


to what can be achieved using four nodes. 
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b. Assessment of SDP4 
Table 6.3 gives values for speedup and efficiency for various n and m values 


using the revised model with the following substitutions obtained experimentally: 


* tr. =514 msecs * ts =4 msecs 
* tr, = 18.2 msecs * td, = 542 msecs 
* tf = 10.8 msecs * td, = 8.85 msecs 


Again minimal variance in speedup or efficiency is observed with changes in m or n. 

The speedup and efficiency were obtained experimentally using four and eight 
nodes for the m=5 case for various numbers of satellites, and are given in Table 6.4. 
These actual efficiencies and speedups indicate again that the estimated performance 
values listed in Table 6.3 are low, but still reasonable. Table 6.4 also reinforces the 
observation that more than four nodes does not improve performance. 

The performance of SDP4 yields better results than SGP4 since SDP4 has a 
higher propagation time per satellite, but this configuration for SDP4 is still limiting 
since it is restricted to four nodes. 

c. Conclusion 

Applying Phipps’ domain decomposition strategy to SGP4, SDP4, and most 
certainly SGP (i.e., since SGP has an even smaller propagation time per satellite), on the 
iPSC/2 hypercube in the Mathematics Department at the Naval Postgraduate School 
which does not have concurrent disk access capability has limited performance 
capabilities (i.e., restricted to the use of four nodes). Most likely this conclusion will also 
hold for PPT2. But, there is great potential for Phipps' domain decomposition strategy if 
applied to a parallel computer that has the ability to perform concurrent disk access. The 
more processors that can access the disk simultaneously will likely result in more 


improved efficiencies and speedups. 
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TABLE 6.3 PSDP4 ESTIMATED PERFORMANCE 







Number of Subroutine Calls Per Satellite (7m) 







TABLE 6.4 PSDP4 ACTUAL PERFORMANCE 


Number of Four Nodes Eight Nodes 
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VIL CONCLUSIONS AND SUGGESTIONS FOR FURTHER RESEARCH 


The overall objective of this thesis was to determine the parallel computing potential 
of the AFSPACECOM satellite motion models. The results given in Chapter V and 
Chapter VI indicate that Phipps’ domain decomposition strategy holds promise, 
especially if applied to a hypercube with concurrent disk access capability. This 
algorithm is simple to apply. It provides the flexibility to vary the dimension of the 
hypercube and makes modifications to the satellite motion models easy to implement. 

When omitting the effect of the input and output nodes, a maximum efficiency of 
only .56 was achieved for SGP, .60 for SGP4, and .64 for SDP4, but the potential 
efficiency is artificially bounded by the number of nodes available with the specific 
hypercube used. Having a maximum of only eight nodes available, which implies six 
working nodes, the efficiency of the domain decomposition algorithm is bounded above 
by .75. Using Phipps’ domain decomposition model, it was shown that a maximum 
efficiency near 100% could be achieved for the m=1 case (i.e., one subroutine call per 
satellite) with a hypercube of 128 nodes for PSGP and a hypercube of 64 nodes for 
PSGP4 and PSDP4. In comparison, the parallelized version of PPT2 achieves a 
maximum efficiency at 16 nodes. In the case of higher m values (25 for PSDP4 and 75 
for PSGP4) much higher speedups are obtainable yielding a maximum speedup of 1800 
for PSGP4 and 650 for PSDP4, but at lower efficiencies. Also, these higher m values 
produced maximum efficiencies near 100% with a hypercube of 256 nodes for PSGP4 
and a hypercube of 128 for PSDP4. 

Including the effects of the input and output nodes significantly limits the 
performance capabilities of PSGP, PSGP4, and PSDP4 when applied to a hypercube 


without concurrent disk access capability. Here it was discovered that the performance is 
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restricted to what can be achieved using four nodes, which creates an upper bound of .5 
for the efficiency corresponding to a speedup of two. Experimentally, a maximum 
efficiency of .41 was achieved with PSGP4 and .45 with PSDP4. 

In conclusion, applying Phipps’ domain decomposition strategy to a hypercube with 
concurrent disk access capability could result in speedup factors that would significantly 
reduce the time to predict state vectors for several thousand satellites. 

The success in applying Phipps’ domain decomposition strategy to SGP, SGP4, 
SDP4, and PPT2 using the iPSC/2 hypercube at the Naval Postgraduate School creates 
several possibilities for future research. One area of research would involve applying the 
parallelized versions of SGP, SGP4, SDP4, and PPT2 to higher dimension hypercubes 
that have concurrent disk access capabilities and validate the estimates presented by 
Phipps’ domain decomposition model, thus omitting the effects of the input and output 
nodes. Then investigate the performance obtainable using the concurrent disk access 
Capabilities which will be dependent upon the number of nodes that can access the disk 
simultaneously. 

Another possible area of research would involve modifying the current satellite 
motion models to increase the accuracy of their predictions. The results in Chapters V 
and VI showed an increase in performance if the amount of computation was increased. 
Thus, greater accuracy could be achieved in far less time using Phipps’ domain 
decomposition algorithm for SGP, SGP4, SDP4, and PPT2 than the time using the 
Original serial algorithms. Also, from these results, applying this algorithm to satellite 
motion models that are more computationally intensive, such as semi-analytic models, 
would yield greater parallel computing potential. 

Investigating other methods of parallelization presents another wide open area of 
research. One method of interest involves the use of Parallel Virtual Machines (PVM). 


This software provides a means to make a group of work stations act like a parallel 


computer. Parallelizing through PVM is appealing, since work stations have already 
proven to be quite invaluable for the day to day tasks in business and academic 
organizations due to their versatility, affordability, and performance capabilities, and 
having the added ability of parallel computing opens up new areas of computer power. 
With the growing pace in parallel computer technology, research in applying satellite 
propagation models to the more recently developed parallel computers can lead to 
tremendous breakthroughs. 

In conclusion, the results of Phipps’ (1992) thesis combined with the results from 
this thesis clearly shows that satellite position prediction can be made more timely 
through parallel computing. Although the best method of parallelization may vary 
depending on the specific model used as well as the type of parallel computer used, 
parallel computing presents one sure avenue to achieve timely satellite position 


prediction for the growing number of earth orbiting objects. 
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APPENDIX A 


INTEL iPSC/2 SPECIFICATIONS 


This appendix contains a summary of the iPSC/2 hypercube multcomputer 


specifications as described in (iPSC/2 User's Guide, 1990, pp. 1-1 - 1-11). The exact 


performance values were obtained from (Arshi, 1988, pp. 17-22). 


System Resource Manager (Host) 


Central Processing Unit 
Numeric Processing Unit 
Memory 

External Communication 


Operating System 


Nodes 
Node Processor 
Numeric Co-processor 


Operating System 


Internal Communication 


Direct-Connect™ Routing 


iPSC/2 


INTEL 80386 (4 MIP) 

INTEL 80387 (250 KFLOP 64-bit) 

8 Mbyte 

Ethernet TCP/IP local area network port 
AT&T Unix, Version V, Release 3.0 


INTEL 80386 (4 MIP) 
INTEL 80387 (250 KFLOP 64-bit) 
NX/2 


Message Latency -- 350 usec 


Message Bandwidth -- 2.8 Movies 
sec 


APPENDIX B 
SOURCE CODE LISTING 


This appendix contains a listing of the host and node programs for the parallelized 
version Of SGP, and SGP4/SDP4. Before each program set the input parameters are 
listed. Recall, SGP4 and SDP4 are two separate subroutines that are part of the same 
program. The host program for each satellite motion model loads the node program and 
clears the nodes once the process is complete. The node program contains the 
instructions for the nodes to complete their respective portions of the parallel algorithm. 
The node program assigns Node 0 to be the distributing node, the highest numbered node 
to be the collecting node, and the remaining nodes to be the working nodes. The 
working nodes execute the original SGP, or SGP4/SDP4 subroutine with only minor 


modifications. 


SGP 
Input Parameters 
Parameter Description 
lept End of file indicator 
ts Start me (first propagation time in minutes) 
tf Stop time (last propagation time in minutes) 
delt Time increment in minutes 


(determines number of propagation times) 








epoch Epoch reference time (year.day) 
xndt2o Time derivative of mean motion divided by two [ee 
ay 
xndd60 Second time derivative of mean motion divided by six [Res 
ay 
xincl Orbital inclination (degrees) 
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xnodeo 


eo Eccentricity (degrees) 
omegao Argument of perigee (degrees) 
xmo Mean anomaly (degrees) 
xno Mean motion (Rex 
Day 


parameter), this forms a message size, from input node to a working node, of 104 bytes. 


There are a total of 13 parameters. 


Host Program: 


* 4 & 


program PSGPh 


This host program loads the node program PSGPn on the 
nodes of the attached hypercube. Upon completion of the 
catalog of satellite data, the program clears the nodes for 


another process. 
Set host specific parameters 
data pid /0/ 
Set process id 
call setpid (pid) 
Load program PSGPn on the nodes 


print *, ‘loading nodes’ 
call load (psgpn’, -1, pid) 


Receive message that nodes are complete 


call crecv (99, istop, 4) 
print *, ‘nodes complete’ 


Kill process on nodes 


Call killcube (-1, -1) 


stop 
end 
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Right ascension of node (degrees) 


Using double precison (i.e., 8 bytes per 


Node Program: 
program PSGPn 


This program propagates n-2 satellites concurrently, where n is 
the number of nodes belonging to the attached cube. Node 0 is 
the distributing node and the Node n-1 is the collecting node. 
The remaining nodes are the working nodes that propagate the 
satellites using AFSPACECOM subroutine SGP. For simplicity, 
the tasks for all nodes are combined on this one node program. 
Tasks are partitioned by logical if statements. 


* + © &© © & & 


implicit real * 8 (a- b, 0 - z) 
real * 8 sat (13, 25000), f (13), 2(7* 541) 


integer hostid, pid, outlen 
common / sg / f (13), g (7 * 5 + 1), ig 


data inlen / 104 / 
data isat/1/,n/1/, istop/1/, pid/0/ 


maxlen = 8 * (7 * 5+ 1) 


*** WARNING 5 IN THE ARRAY G AND IN THE VALUE MAXLEN IS THE MAXIMUM 
*** NUMBER OF PROPAGATION TIMES FOR A GIVEN SET OF SATELLITE INPUT 


mynod = mynode( ) 
numnode = numnodes( ) 
hostid = myhost( ) 


*(00000000000000000000000000000000000000000000000000000000000000000000 
* Node 0 reads and distributes data among the working nodes 

if (mynod .eq. 0) then 
* Read complete catalog of satellite data 


open (30, file='satinp.I', status='old’, form='formatted') 
10 read (30, *) (sat (j, isat), j = 1, 4) 

if (sat (1, isat) .eq. 0.0) go to 25 

read (30, 15) (sat (j, isat), j = 5, 10) 
15 FORMAT (F15.8, F10.8, E9.5, F8.4, F9.4, F9.7) 

read (30, 20) (sat (j, isat), j = 11, 13) 

if (sat (13, isat) le. 0.0) go to 10 
20 FORMAT (F8.4, F9.4, F12.8) 


isat = isat + 1 


go to 10 
25 Close (unit = 30) 
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* Send number of data sets to all nodes 


isat = isat - 1 
call csend (mynod, isat, 4, -1, pid) 


* Distribute satellite data sets to working nodes 


itl = mclock( ) 
iter = isat / (numnode - 2) 
do 1201 j =1, iter 
do 1201 i= 1, numnode - 2 
call csend (mynod, sat (1, n), inlen, 1, pid) 


1201 n=n+1 


iter = mod (isat, numnode - 2) 
if (iter It. 1) go to 1203 
n=n-1 

do 1202 j =1, iter 


1202 call csend (mynod, sat (1, n + j), inlen, j, pid) 
1203 it2 = mclock () 


* End Node 0 





* Begin working nodes 


else 
if (mynod .lt. numnode - 1) then 


* Receive number of data sets from Node 0 and compute total number 
* of satellites to propagate 


call crecv (0, isat, 4) 

itl = mclock( ) 

if (mynod .le. mod (isat, numnode - 2)) then 
iter = isat / (numnode - 2) + 1 

else 
iter = isat / (numnode - 2) 

endif 


* Receive satellite data, execute SGP, and send results to 
* collecting node 


1220 


do 1220 i= 1, iter 

call crecv (0, f, inlen) 

call sgpm( ) 

outien = 8 * (7* ig + 1) 
call csend (mynod, g, outlen, numnode - 1, 0) 
it2 = mclock( ) 


dz 


* End Working Nodes 


* 


*CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 
* Begin Collecting Node 
else 
open (unit = 11, file = 'velpos2') 
write (11, 1235) Tsince’, 'X’, 'Y’, 'Z’, "XDOT, "YDOT, "ZDOT’ 
1235 format (10x, a, 10x, a, 18x, a, 18x, a/ 26x, a, 15x, a, 15x,a) 
* Receive total number of satellite sets 
call crecv (0, isat, 4) 
* Collect results and write to external file from Working Nodes 
itl = mclock( ) 
do 1240 k = 1, isat 
call crecv (-1, g, maxlen) 
ig = g(1) 
write (unit = 11, 1245) (gi), i= 2+ G -2)* 7,8 + (G- 2) * 7), j=2, ig + 1) 
1240 continue 
it2 = mclock( ) 
1245 format (/ 4f17.8 / 15x, 3f17.8) 
close (unit = 11) 
* Send message to Host that process is complete 
call csend (99, istop, 4, hostid, pid) 
* End Collecting Node 


*Ccccccccccccccecccccccccccceccccccccccccccccccccccccccccccccccc 


endif 
endif 


itot = it2 - itl 
print *, 'node#', mynod, ' time ', itot,’ for’, isat, ’ sats’ 


stop 
end 
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SGP4/SDP4 
Input Parameters 
Here the input parameters are read in a little differently compared to SGP. The 
reading format is more compatible with the PC version of SGP4/SDP4 received from the 


AFSPACECOM. Also, this input is more complete in terms of object identification. 








Parameter Description 
icrdno Line number of data input 
satn Satellite number 
yt Year 
rday Day number 
xndot Time derivative of mean motion divided by two es 
xn2dt Second time derivative of mean motion divided by six ee j 
ie Exponent of xnd2dt 
bterm Modified ballistic coefficient (ER-') 
ie2 Exponent of bterm 
ephtyp Ephemeris type 
icrdno2 Line number of data input 
xinco Orbital inclination (degrees) 
xnodeo Right ascension of node (degrees) 
e€0 Eccentricity (degrees) 
omegao Argument of perigee (degrees) 
xmao Mean anomaly (degrees) 
xno Mean motion (S) 
Day 
iyr Year of start time 
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srday Day number of start time 
jyt Year of stop time 

spday Day number of stop time 
delta Time increment in minutes 


from input node to a working node, of 176 bytes. 


(Determines number of propagation times) 


There are a total of 22 parameters. Using double precison this forms a message size, 


Host Program: 


*# # & & 


program PSGP4h/PSDP4h 


This host program loads the node program PSGP4n on the 
nodes of the attached hypercube. Upon completion of the 
catalog of satellite data, the program clears the nodes for 


another process. 
Set host specific parameters 
data pid/0/ 
Set process id 
call setpid (pid) 
Load program PSGP4n on the nodes 


print *, ‘loading nodes’ 
call load (‘psgp4n’, -1, pid) 


Receive message that nodes are complete 


call crecv (99, istop, 4) 
print *, ‘nodes complete’ 


Kall process on nodes 


call killcube (-1, -1) 


stop 
end 
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Node Program: 
program PSGP4n 


This program propagates n-2 satellites concurrently, where n 1s 
the number of nodes belonging to the attached cube. Node 0 is 
the distributing node and the Node n-! is the collecting node. 
The remaining nodes are the working nodes that propagate the 
satellites using AFSPACECOM subroutine SGP4 for near-Earth 
or SDP4 for deep-space objects. For simplicity, the tasks for all 
nodes are combined on this one node program. Tasks are par- 
titioned by logical if statements. 


* && *& & & & & & 


implicit real * 8 (a-h, 0 - z) 
real * 8 sat (22), f (22), 2g (7 *5+ 1) 


integer hostid, pid, outlen 
common / sg / f (22), g (7 * 5 + 1), ig 
data inlen / 176 / 

data isat/1/,n/1/, istop/1/, pid/0/ 
maxlen = 8 * (7 *5 + 1) 


*** WARNING 5 IN THE ARRAY G AND IN THE VALUE MAXLEN IS THE MAXIMUM 
*** NUMBER OF PROPAGATION TIMES FOR A GIVEN SET OF SATELLITE INPUT 


mynod = mynode( ) 
numnode = numnodes( ) 
hosud = myhost( ) 





* Node 0 reads and distributes data among the working nodes 
if (mynod .eq. 0) then 
* Read complete catalog of satellite data 


open (10, file='satinp4.I’, status='old’, form="formatted’) 
10 read (10, *) (sat (j, isat), j = 1, 10) 

if (sat (1, isat) .eq. 0.0) go to 35 

read (10, *) (sat (j, isat), } = 11, 17) 

read (10, *) (sat (j, isat), j = 18, 22) 

isat = 1sat + 1 
35 _ close (unit = 10) 


* Send number of data sets to all nodes 
isat = isat - l 


call csend (mynod, isat, 4, -1, pid) 
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* Distribute satellite data sets to working nodes 


itl = mclock( ) 

iter = isat / (numnode - 2) 

do 1201 j =1, iter 

do 1201 i= 1, numnode - 2 
call csend (mynod, sat (1, n), inlen, 1, pid) 

1201 n=n+1 

iter = mod (isat, numnode - 2) 

if (iter It. 1) go to 1203 

n=n- 1 

do 1202 j =1, iter 
1202 call csend (mynod, sat (1, n + j), inten, j, pid) 
1203 it2 = mclock () 


* End Node 0 





* Begin working nodes 


else 
if (mynod .It. numnode - 1) then 


* Receive number of data sets from Node 0 and compute total number 
* of satellites to propagate 


call crecv (0, isat, 4) 

itl = mclock( ) 

if (mynod .le. mod (isat, numnode - 2)) then 
iter = isat / (numnode - 2) + 1 

else 
iter = isat / (numnode - 2) 

endif 


* Receive satellite data, execute SGP4/SDP4, and send results to 
* collecting node 


do 1220 i =1, iter 
call crecv (0, f, inlen) 
call sgp4m( ) 
outlen = 8 * (7* ig + 1) 
1220 _— call csend (mynod, g, outlen, numnode - 1, 0) 
it2 = mclock( ) 


* End Working Nodes 


= 


*CcCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 
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* Begin Collecting Node 


else 
open (unit = 12, file = ‘velpos4’) 
write (11, 1235) 'Tsince’, 'X’, 'Y', 'Z', "XDOT, "YDOT, 'ZDOT 
1235 format (10x, a, 10x, a, 18x, a, 18x, a/ 26x, a, 15x, a, 15x,a) 


* Receive total number of satellite sets 
call crecv (0, isat, 4) 
* Collect results and write to external file from Working Nodes 
itl = mclock( ) 
do 1240 k = 1, isat 
call crecv (-1, g, maxlen) 
ig = g(1) 
write (unit = 12, 1245) ((g(i), i= 2+ G -2)* 7,8+(G-2)* 7),j=2, ig + 1) 
1240 continue 
it2 = mclock( ) 
1245 format (/ 4f17.8 / 15x, 3f17.8) 
close (unit = 11) 
* Send message to Host that process is complete 
call csend (99, istop, 4, hostid, pid) 
* End Collecting Node 
*CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 


endif 
endif 


itot = it2 - itl 
print *, 'node# ', mynod, ' time ‘, itot,' for‘, isat, ' sats’ 


stop 
end 
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APPENDIX C 


EXECUTION TIMES 


This appendix contains the experimental execution times for the parallel and serial 
versions of SGP, SGP4, and SDP4 for various numbers of satellites. The parallel 
execution times are for an eight-node and a four-node hypercube. There is only one 
subroutine call per satellite, and the time to read and write is not included (i.e., only the 


calculational portion 1s timed). 


SGP 
TABLE C.1 EXECUTION TIMES FOR PSGP AND SSGP 


PSGP 


_ 8 Nodes (msec) 4 Nodes (msec) _ 






Number 
of 


Satellites 
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SGP4 


TABLE C.2 EXECUTION TIMES FOR PSGP4 AND SSGP4 


Number PSGP4 
of 


_8 Nodes (msec) 4 Nodes (msec) _ 


21000 31100 
42400 62100 
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SDP4 
TABLE C.3 EXECUTION TIMES FOR PSDP4 AND SSDP4 


Number PSDP4 
of 


8 Nodes (msec) | 4 Nodes (msec) _ 














SSDP4 
(msec) 


a es 
_ ee ee 
re ee 
Pw | oss | mw tts 
pine | ts to | tg 


5000 10200 31000 50800 


000 moo | x20 | 102000 





62200 102000 
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